Computer Vision papers and notes
Traditional Methods
Lowe - 1999 - Object recognition from local scale-invariant features
: SIFT
Dalal and Triggs - 2005 - Histograms of Oriented Gradients for Human Detection
: HOG
Viola and Jones - Robust Real-time Object Detection
: Viola-Jones
Image Classification
LeCun et al - 1998 - Gradient-Based Learning Applied to Document Recognition
: LeNet
Krizhevsky et al - ImageNet Classification with Deep Convolutional Neural Networks
: AlexNet
Simonyan and Zisserman - 2015 - Very Deep Convolutional Networks for Large-Scale Image Recognition
: VGG
He et al - 2015 - Deep Residual Learning for Image Recognition
: ResNet
He et al - 2016 - Identity Mappings in Deep Residual Networks
Szegedy et al - 2015 - Rethinking the Inception Architecture for Computer Vision
: Inception-v3
Howard et al - 2017 - MobileNets Efficient Convolutional Neural Networks for Mobile Vision Applications
: MobileNet
Zhang et al - 2017 - ShuffleNet An Extremely Efficient Convolutional Network for Mobile Devices
: ShuffleNet
Huang et al - 2018 - Densely Connected Convolutional Networks
: DenseNet
Hu et al - 2019 - Squeeze-and-Excitation Networks
: SENet
Tan and Le - 2020 - EfficientNet Rethinking Model Scaling for Convolutional Neural Networks
: EfficientNet
Object Detection
Girshick et al - 2014 - Rich feature hierarchies for accurate object detection and semantic segmentation
: R-CNN
Girshick - 2015 - Fast R-CNN
: Fast R-CNN
Ren et al - 2016 - Faster R-CNN Towards Real-Time Object Detection with Region Proposal Networks
: Faster R-CNN
Redmon et al - 2016 - You Only Look Once Unified, Real-Time Object Detection
: YOLOv1
Redmon and Farhadi - 2016 - YOLO9000 Better, Faster, Stronger
: YOLOv2
Liu et al - 2016 - SSD Single Shot MultiBox Detector
: SSD
Lin et al - 2017 - Feature Pyramid Networks for Object Detection
: FPN
Cai and Vasconcelos - 2017 - Cascade R-CNN Delving into High Quality Object Detection
: Cascade R-CNN
Redmon and Farhadi - YOLOv3 An Incremental Improvement
: YOLOv3
He et al - 2018 - Mask R-CNN
: Mask R-CNN
Bochkovskiy et al - 2020 - YOLOv4 Optimal Speed and Accuracy of Object Detection
: YOLOv4
Vision Transformer
Dosovitskiy et al - 2021 - An Image is Worth 16x16 Words Transformers for Image Recognition at Scale
: ViT
Tolstikhin et al - 2021 - MLP-Mixer An all-MLP Architecture for Vision
: MLP-Mixer
Liu et al - 2021 - Swin Transformer Hierarchical Vision Transformer
: Swin Transformer
Carion et al - 2020 - End-to-End Object Detection with Transformers
: DETR
He et al - 2021 - Masked Autoencoders Are Scalable Vision Learners
: MAE
Radford et al - 2021 - Learning Transferable Visual Models From Natural Language Supervision
: CLIP
GAN
Goodfellow et al - Generative Adversarial Nets
: GAN